---
title: "Social Media Usage Analysis"
author: "Scot Swanson"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    theme:
      bootswatch: zephyr
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(plotly)
library(DT)
library(ggplot2)
```
# Title Page

Row {.tabset}
-----------------------------------------------------------------------

### Project Title
**Social Media Usage Analysis**
### Team Members
Scot Swanson
# Introduction

Row {.tabset}
-----------------------------------------------------------------------

### Project Overview
This project investigates social media usage metrics, focusing on engagement patterns across various platforms. By analyzing metrics such as daily time spent, posts, likes, and follows, we aim to gain insights into user behavior and engagement trends across platforms like Instagram, Facebook, and Twitter.
### Research Background and Significance
Social media has become an integral part of daily life, influencing communication, culture, and commerce. Analyzing social media engagement helps companies and researchers understand user behavior, optimize platform content, and potentially improve user experience. This study is significant because it sheds light on the factors contributing to higher engagement, which can benefit marketers, advertisers, and platform developers.
### Research Questions
This analysis addresses the following research questions:

- Which social media platform has the highest average daily usage among users?
- What is the relationship between likes and follows across different platforms?
- How does time spent correlate with follows per day?
### Data Source and Collection
The dataset used for this project is publicly available on Kaggle: [Social Media Usage Dataset](https://www.kaggle.com/datasets/bhadramohit/social-media-usage-datasetapplications). This dataset includes detailed metrics on daily social media activity, covering platforms such as Instagram, Facebook, and Twitter. These metrics provide insights into posts, likes, follows, and time spent on each platform, allowing us to answer our research questions.
# Data Loading and Cleaning
```{r}
# Load the social media dataset
data <- read.csv("social_media_usage.csv")
# Data Cleaning Steps:
# 1. Remove any rows with missing values for cleaner analysis
data <- na.omit(data)
# 2. Feature engineering: create an "Engagement" variable that represents
#    daily user activity as the sum of posts, likes, and follows
data <- data %>%
  mutate(Engagement = Posts_Per_Day + Likes_Per_Day + Follows_Per_Day)
# Display the first few rows of the cleaned data for verification
head(data)
```
### Detailed Data Cleaning and Manipulation Process
To prepare the data for analysis, we followed these data cleaning steps:

- **Loaded the Data**: Read the data from a CSV file with `read.csv()`.
- **Removed Missing Values**: Dropped every row that contains a missing value with `na.omit()`, so all later summaries use complete records only.
- **Feature Engineering**: Created a new "Engagement" variable by summing the daily counts of posts, likes, and follows. This provides a single overall measure of user activity.
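
The cleaning steps above can be checked with a short sketch that also reports how many rows `na.omit()` drops. It does not read the real CSV: `sample_rows`, `cleaned`, and `rows_dropped` are hypothetical names, and the missing value is injected on purpose for illustration.

```{r}
library(dplyr)

# Six illustrative rows using the dataset's column names.
# Assumption: the NA in Likes_Per_Day is injected here to exercise the cleaning step.
sample_rows <- data.frame(
  User_ID = paste0("U_", 1:6),
  App = c("Pinterest", "Facebook", "Instagram", "TikTok", "LinkedIn", "Twitter"),
  Daily_Minutes_Spent = c(288, 192, 351, 21, 241, 464),
  Posts_Per_Day = c(16, 14, 13, 20, 16, 3),
  Likes_Per_Day = c(94, 117, 120, 117, NA, 137),
  Follows_Per_Day = c(0, 15, 48, 8, 21, 30)
)

rows_before <- nrow(sample_rows)

# Same two steps as the main pipeline: drop incomplete rows, then derive Engagement
cleaned <- sample_rows %>%
  na.omit() %>%
  mutate(Engagement = Posts_Per_Day + Likes_Per_Day + Follows_Per_Day)

rows_dropped <- rows_before - nrow(cleaned)

# Sanity check: no missing values remain after cleaning
stopifnot(!anyNA(cleaned))

cat("Rows dropped by na.omit():", rows_dropped, "\n")
```

Reporting the dropped-row count makes it easier to judge whether removing incomplete rows changes the sample in a meaningful way.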
# Summary Statistics

Row {.tabset}
-----------------------------------------------------------------------

### Distribution of Daily Minutes Spent
```{r}
ggplot(data, aes(x = Daily_Minutes_Spent)) +
  geom_histogram(bins = 20, fill = "#377eb8", color = "black", alpha = 0.8) +
  labs(title = "Distribution of Daily Minutes Spent on Social Media",
       x = "Daily Minutes Spent", y = "Frequency") +
  theme_minimal() +
  theme(plot.title = element_text(size = 16, face = "bold"),
        axis.title = element_text(size = 14),
        axis.text = element_text(size = 12))
```
### Average Daily Usage by Platform
```{r}
# Mean daily minutes per platform, sorted from highest to lowest
avg_usage <- data %>%
  group_by(App) %>%
  summarize(avg_daily_minutes = mean(Daily_Minutes_Spent)) %>%
  arrange(desc(avg_daily_minutes))

# geom_col() plots the precomputed means directly
ggplot(avg_usage, aes(x = reorder(App, -avg_daily_minutes), y = avg_daily_minutes)) +
  geom_col(fill = "#4daf4a", color = "black") +
  labs(title = "Average Daily Minutes Spent by Platform",
       x = "Platform", y = "Average Daily Minutes") +
  theme_minimal() +
  theme(plot.title = element_text(size = 16, face = "bold"),
        axis.title = element_text(size = 14),
        axis.text = element_text(size = 12),
        axis.text.x = element_text(angle = 45, hjust = 1))
```
# Correlation Analysis
### Correlation Matrix of Engagement Metrics
```{r}
# Pairwise Pearson correlations between the numeric usage metrics
correlation_matrix <- data %>%
  select(Daily_Minutes_Spent, Posts_Per_Day, Likes_Per_Day, Follows_Per_Day, Engagement) %>%
  cor()

# Pass the matrix directly; the title font is set through the title list
plot_ly(
  z = correlation_matrix,
  x = colnames(correlation_matrix),
  y = rownames(correlation_matrix),
  type = "heatmap",
  colorscale = "Viridis"
) %>%
  layout(title = list(text = "Correlation Matrix for Social Media Usage Metrics",
                      font = list(size = 16)))
```
# Exploration

Row {.tabset}
-----------------------------------------------------------------------

### Engagement Across Platforms
```{r}
# Total engagement per platform, sorted from highest to lowest
engagement_by_platform <- data %>%
  group_by(App) %>%
  summarize(total_engagement = sum(Engagement)) %>%
  arrange(desc(total_engagement))

# geom_col() plots the precomputed totals directly
ggplot(engagement_by_platform, aes(x = reorder(App, -total_engagement), y = total_engagement)) +
  geom_col(fill = "#FF7F0E", color = "black") +
  labs(title = "Total Engagement Across Social Media Platforms",
       x = "Platform", y = "Total Engagement") +
  theme_minimal() +
  theme(plot.title = element_text(size = 16, face = "bold"),
        axis.title = element_text(size = 14),
        axis.text = element_text(size = 12),
        axis.text.x = element_text(angle = 45, hjust = 1))
```
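
Total engagement grows with the number of users a platform has in the data, so a per-user mean can rank platforms differently. The sketch below illustrates this with invented numbers; `toy_engagement` and `per_platform` are hypothetical names and the values are not taken from the dataset.

```{r}
library(dplyr)

# Hypothetical engagement scores, invented for illustration only:
# three Instagram users and one Twitter user
toy_engagement <- data.frame(
  App = c("Instagram", "Instagram", "Instagram", "Twitter"),
  Engagement = c(100, 120, 110, 170)
)

# Report user count, total, and per-user mean side by side
per_platform <- toy_engagement %>%
  group_by(App) %>%
  summarize(
    users = n(),
    total_engagement = sum(Engagement),
    mean_engagement = mean(Engagement)
  )

per_platform
```

In this toy case the platform with more users leads on the total while the other leads on the mean, which is why it may be worth showing both when comparing platforms.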
### Time Spent vs. Follows Per Day
```{r}
ggplot(data, aes(x = Daily_Minutes_Spent, y = Follows_Per_Day, color = App)) +
  geom_point(size = 3, alpha = 0.6) +
  labs(title = "Correlation Between Time Spent and Follows Per Day",
       x = "Daily Minutes Spent", y = "Follows Per Day") +
  theme_minimal() +
  theme(plot.title = element_text(size = 16, face = "bold"),
        axis.title = element_text(size = 14),
        axis.text = element_text(size = 12))
```
# Discussion

Row {.tabset}
-----------------------------------------------------------------------

### Key Findings and Analysis
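The analysis supports the following observations about social media usage across platforms:

- **Platform with Highest Engagement**: The total-engagement bar chart ranks platforms by the summed posts, likes, and follows of their users. Because it is a total rather than a per-user average, the ranking also depends on how many users each platform has in the dataset.
- **Likes and Follows Relationship**: The correlation matrix shows how closely likes per day and follows per day move together. Both are components of the Engagement variable, so their correlations with Engagement are inflated by construction and should not be read as independent evidence.
- **Time Spent and Follows**: The scatter plot of daily minutes spent against follows per day, colored by platform, shows whether users who spend more time also follow more accounts. A correlation here does not by itself establish that time spent drives follows.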
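
One way to put a number on the time-spent and follows relationship is a correlation test, sketched below with base R's `cor.test()`. The six rows in `sample_rows` (a hypothetical name) use the dataset's column names but are far too few to support a conclusion; this is an illustration of the call, not a result.

```{r}
# Six illustrative rows using the dataset's column names
sample_rows <- data.frame(
  App = c("Pinterest", "Facebook", "Instagram", "TikTok", "LinkedIn", "Twitter"),
  Daily_Minutes_Spent = c(288, 192, 351, 21, 241, 464),
  Follows_Per_Day = c(0, 15, 48, 8, 21, 30)
)

# Pearson correlation with a significance test and confidence interval
time_vs_follows <- cor.test(
  sample_rows$Daily_Minutes_Spent,
  sample_rows$Follows_Per_Day,
  method = "pearson"
)

# Correlation estimate, 95% confidence interval, and p-value
time_vs_follows$estimate
time_vs_follows$conf.int
time_vs_follows$p.value
```

On the full dataset the same call would take `data$Daily_Minutes_Spent` and `data$Follows_Per_Day`; running it per platform would address how the relationship differs across apps.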
### Limitations
While the analysis provides valuable insights, there are limitations to consider:

- **Dataset Scope**: The data may not capture all social media platforms or all types of user interactions.
- **Engagement Calculation**: Our engagement metric is a simple sum and may not reflect nuanced interactions or platform-specific dynamics.
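
Because the simple sum lets the component with the largest counts (typically likes) dominate, one possible alternative is to standardize each component before combining them. The sketch below is an assumption about how that could be done, not the method used in this analysis; `Engagement_Z` and `sample_rows` are hypothetical names.

```{r}
# Six illustrative rows using the dataset's column names
sample_rows <- data.frame(
  App = c("Pinterest", "Facebook", "Instagram", "TikTok", "LinkedIn", "Twitter"),
  Posts_Per_Day = c(16, 14, 13, 20, 16, 3),
  Likes_Per_Day = c(94, 117, 120, 117, 9, 137),
  Follows_Per_Day = c(0, 15, 48, 8, 21, 30)
)

components <- c("Posts_Per_Day", "Likes_Per_Day", "Follows_Per_Day")

# scale() centers each column at 0 and divides it by its standard deviation
z_scores <- scale(sample_rows[components])

# Hypothetical alternative metric: the mean z-score across the three components,
# so each component contributes on the same scale
sample_rows$Engagement_Z <- rowMeans(z_scores)

# Rows ordered from highest to lowest standardized engagement
sample_rows[order(-sample_rows$Engagement_Z), c("App", "Engagement_Z")]
```

Equal weighting of the standardized components is itself a modeling choice; platform-specific weights would need justification from outside this dataset.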
### Future Work
Future studies could include:

- **Time-Series Analysis**: Explore how engagement metrics change over time.
- **User Demographics**: Analyze engagement patterns by user demographics to gain targeted insights.
# References

- Kaggle. Social Media Usage Dataset. Available at: [https://www.kaggle.com/datasets/bhadramohit/social-media-usage-datasetapplications](https://www.kaggle.com/datasets/bhadramohit/social-media-usage-datasetapplications)
- Tidyverse Documentation. R Packages for Data Science. Available at: [https://www.tidyverse.org/](https://www.tidyverse.org/)
- Plotly Documentation. Interactive Plots in R. Available at: [https://plotly.com/r/](https://plotly.com/r/)